<html>
<head>
<link rel=stylesheet href="style.css" type="text/css">
<title>collectl - Lustre</title>
</head>

<body>
<center><h1>Lustre</h1></center>
<p>
<h3>Overview</h3>
The first thing to understand about lustre reporting is in most cases, where one has configured
the server(s) and just wants to monitor them, all one need do is specify -sl or -sL and
collectl will do the right thing.
It will automatically detect the type of service(s) currently running and will
either record or display the appropriate data.
If you select -sl and the system doesn't have lustre installed, it will warn
you and then disable that switch.
<p>
<h3>Controlling Which Data is Displayed</h3>
It turns out that lustre records a wealth of performance data, far more than makes sense to
display all the time, and so by default collectl displays minimal information
such as bytes/operations read and written.  At the client detail level lustre
can differentiate this
data at the filesystem and even the OST level!  In order to accomodate the broadest flexibility
one is allowed to control the way data is collected/displayed via several complementary
switches.

<ul>
<li>-s:  As is normally the case, one can specify '-sl' for summary level data, '-sL' for
detail data or combine them to get both.
However, since the client detail data can actually be presented at the individual filesystem or 
OST level, a third option '-sLL' has been included to indicate OST level details, noting that 
'-sL' aggregates at the filesystem.</li>
<li>-O: The '-O' switch is used to provide further detail about the types of data that is to be
collected/displayed.  There are 4 such values that collectl cares about:
<ul>
<li>B - rpc buffer level data.</li>
<li>D - disk block statistics, which applies to both MDS and OSS servers.  One should also note
this is specific to HP SFS and this data is not available in the open source version.</li>
<li>M - client metadata (note that this was the default prior to collectl V1.6.2).</li>
<li>R - read_ahead statistics.  Unlike the other options, which generate a lot of data,
-OR may be used with brief mode.</li>
<p>
As it turns out, nothing is quite as simple as it seems and while the following
case is not typical, it needs to be addressed for completeness.  Since collectl allows one to
collect one set of data and to later display a different set,
consider what happens in one were
to collect multiple types of lustre data for an OST using -DB, but then just play back
the basic OST data which is collected without specifying -O.
By default, playback mode defaults
to the settings data was collected with and to change the display one needs to explicity
change those settings.  To meet this need, there are 3 additional values one can use with
-O, namely c, o and m to indicate on playback one wants to see the base data.  Natually
these can be combined with other valid values for -O as well for maximum flexibility.
</ul>
</ul>
<p>
In the spirit of letting the user display whatever they want to, collectl will allow one to
select multiple values for -O and it will try to display the results appropriately.
Perhaps the easiest thing to do is just experiment and in most cases you'll get what you're 
looking for.  There are a few combinations of -s and -O that do not make sense and if you 
choose one, you will be told.
<p>

<h3> What About Playback?</h3>
As is always the case with playback, unless otherwise told to do something else, collectl 
will playback its recorded data
based on the parameters selected for collection.  In other words, if you specify '-OBR'
in record mode, collectl will record both RPC buffer and read_ahead stats.  When you play the
data back, it will then display both as well.  However, you also have the option of specifying
-O, typically thought of as a collection-only switch, and it will force the output to what
you'd like it to be.  If you select a statistics type that hasn't been recorded,
that information will be displayed, but as zeros.
<p>
<h3>Recognizing Service Configuration Changes</h3>
In some cases lustre services may change after collectl starts. This includes
services starting and stopping as well as the configurations of those services
themselves changing.
For  example one might occasionally mount/umount different lustre filesystems on
a client.  Not to worry.  Collectl
periodically checks for configuration changes and automatically adjusts the data it collects
as well as anything it may be currently displaying.  If you know that the configuration will
be limited to only 1 or 2 possible services, you can reduce the overhead in checking for
those services by specifying a finite list with -L.  However, in most cases this extra
overhead is not enough to make it worth bothering with.
<p>
The frequency at which collectl checks for configuration changes is
controlled by the variable 'LustreConfigInt' in 'collectl.conf' and so can easily be
overriden, but this typically shouldn't be necessary.
It is also possible to specify this monitoring frequency via -L when collectl
is started.  One should note that the overhead in monitoring the state changes is related to
the complexity of the server and has been observed to be less than 0.1% when checked
every 10 seconds on a minimally configured server.
<p>
<h3>Changing the Default Recording/Display Behavior</h3>
There are some times when you want specific control over what data is recorded or
displayed rather than the default behavior.  This is typically the case when a system
is playing multiple roles by providing more than one service.  For example, if a system
has been configured as both an OST and a client, every time you run collectl you will collect
or display data about both and sometimes this is NOT what you want.  There may be other times
where you have developed some reports or graphs that expect data in a standard format and
you've collected a subset (or superset) of data.
<p>
To override this behavior of the lustre portion of the data (remember you can
control the displaying of
individual subsystems with -s), use -L to specify the type of services you're
interested
in and collectl will only pay attention to those, both for recording to a file as well as
display.  When in recording mode, this will also limit the types of configuration changes
collectl will watch for too.  One should also note with -L it is possible to collect data
for some services and later display data for a different set.  Naturally when displaying
data for services you never collectled data on, those services will print as zeros.
<p>
If all this sounds confusing, just experiment with various combinations of -s, -L and -O
and observe the behavior.
<p>
<h3>Known Problems</h3>
In the process of enhancing lustre data collection in V1.5.0, it was discovered
that collectl was not as robust as it should have been with respect to the
identification of data files.  As a result, -sLL may capture erroneous data which
cannot be played back with either the version that collected it or any newer versions.
<p>
The number of reads/writes of for lustre client side data is wrong!  The number KB's
read and written when using the -sLL switch are reported as zero.  This is a problem
with lustre and has been reported.

</body>
</html>
